Improving performances of suboptimal greedy iterative biclustering heuristics via localization

Authors

Cesim Erten

Melih Sözdinler

Abstract

MOTIVATION Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. Even the simplest versions of the problem are computationally hard. Most of the proposed solutions therefore employ greedy iterative heuristics that locally optimize a suitably assigned scoring function.

METHODS We provide a fast and simple pre-processing algorithm called localization that reorders the rows and columns of the input data matrix so as to group correlated entries in small local neighborhoods within the matrix. The proposed localization algorithm is rooted in the effective use of graph-theoretical methods applied to problems exhibiting a structure similar to that of biclustering. In order to evaluate the effectiveness of the localization pre-processing algorithm, we focus on three representative greedy iterative heuristic methods. We show how the localization pre-processing can be incorporated into each representative algorithm to improve biclustering performance. Furthermore, we propose a simple biclustering algorithm, Random Extraction After Localization (REAL), which randomly extracts submatrices from the localization pre-processed data matrix, eliminates those with low similarity scores, and provides the rest as correlated structures representing biclusters.

RESULTS We compare the proposed localization pre-processing with another pre-processing alternative, non-negative matrix factorization. We show that our fast and simple localization procedure provides similar or even better results than the computationally heavy matrix factorization pre-processing with regard to H-value tests. We next demonstrate that the performances of the three representative greedy iterative heuristic methods improve with localization pre-processing when biological correlations in the form of functional enrichment and PPI verification constitute the main performance criteria. The fact that the localization-based random extraction method, REAL, performs better than the representative greedy heuristic methods under the same criteria also confirms the effectiveness of the suggested pre-processing method.

AVAILABILITY Supplementary material including code implementations in LEDA C++ library, experimental data, and the results are available at http://code.google.com/p/biclustering/

CONTACTS [email protected]; [email protected]

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
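The random-extraction idea behind REAL can be illustrated with a minimal Python sketch. This is not the authors' implementation (theirs uses the LEDA C++ library). It rests on several assumptions that the abstract does not state: the matrix has already been reordered by some localization step, candidate submatrices are contiguous windows of a fixed size, and the filter is the Cheng-Church mean squared residue, used here as a stand-in for the similarity score (lower residue meaning a more coherent submatrix). The names `mean_squared_residue`, `random_extraction`, and `max_residue` are hypothetical.

```python
import random


def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church style residue of a submatrix; lower means more coherent.

    Assumption: used as a stand-in for the similarity score, which the
    abstract does not define.
    """
    n, m = len(rows), len(cols)
    row_mean = {r: sum(matrix[r][c] for c in cols) / m for r in rows}
    col_mean = {c: sum(matrix[r][c] for r in rows) / n for c in cols}
    total_mean = sum(row_mean.values()) / n
    return sum(
        (matrix[r][c] - row_mean[r] - col_mean[c] + total_mean) ** 2
        for r in rows
        for c in cols
    ) / (n * m)


def random_extraction(matrix, n_samples, height, width, max_residue, seed=0):
    """Sample contiguous windows from an already localized matrix and keep
    those whose residue does not exceed max_residue (hypothetical threshold).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    kept = []
    for _ in range(n_samples):
        # Pick the top-left corner of a height x width window at random.
        top = rng.randrange(n_rows - height + 1)
        left = rng.randrange(n_cols - width + 1)
        rows = range(top, top + height)
        cols = range(left, left + width)
        score = mean_squared_residue(matrix, rows, cols)
        if score <= max_residue:
            kept.append((list(rows), list(cols), score))
    return kept


if __name__ == "__main__":
    # Toy matrix with a perfectly additive pattern, so every window passes.
    additive = [[r + c for c in range(6)] for r in range(6)]
    for rows, cols, score in random_extraction(additive, 3, 2, 3, 1e-9):
        print(rows, cols, score)
```

Contiguous windows are chosen here because localization is described as grouping correlated entries into small local neighborhoods; without that reordering, random windows would rarely capture coherent structure.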

Similar resources

Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering (Extended Version)

This paper presents PCFG-BCL, an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus. Our analysis shows that this procedure uses a greedy approach to adding rules such that each set of rules that is added to the grammar results in th...

Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

Evolutionary Biclustering of Clickstream Data

Biclustering is a two way clustering approach involving simultaneous clustering along two dimensions of the data matrix. Finding biclusters of web objects (i.e. web users and web pages) is an emerging topic in the context of web usage mining. It overcomes the problem associated with traditional clustering methods by allowing automatic discovery of browsing pattern based on a subset of attribute...

Deterministic Approach for Biclustering of Co-Regulated Genes from Gene Expression Data

This paper presents an expression pattern based biclustering technique for grouping both positively and negatively regulated genes together as co-regulated genes from microarray expression data. Most interesting variants of this problem are NP-complete requiring either large computational effort or the use of lossy heuristics to short circuit the calculation. Our approach deterministically find...

A New Survey on Biclustering of Microarray Data

There are subsets of genes that have similar behavior under subsets of conditions, so we say that they coexpress, but behave independently under other subsets of conditions. Discovering such coexpressions can be helpful to uncover genomic knowledge such as gene networks or gene interactions. That is why, it is of utmost importance to make a simultaneous clustering of genes and conditions to ide...

Journal:

Bioinformatics

Volume 26, Issue 20

Pages -

Publication date: 2010
